Theoretical Analysis of Subsequence Time-Series Clustering from a Frequency-Analysis Viewpoint
Abstract
Although Subsequence Time Series (STS) clustering is one of the most popular techniques for pattern discovery from time-series data, a mathematical methodology for analyzing STS clustering (or pattern discovery from time-series data) has attracted little attention. Against this background, a surprising report [10] showed that cluster centers obtained using STS clustering closely resemble "sine waves" with little relation to the input time-series data. Since that report, establishing such a methodology has been recognized as a significant issue. The contributions of this paper are twofold. 1) We give, for the first time, a theoretical analysis of STS clustering from a frequency-analysis viewpoint and identify the mathematical background on which STS clustering generates sine-wave patterns. This also gives a novel theoretical analysis methodology for pattern discovery from time-series data. 2) We propose a clustering algorithm that uses phase-alignment preprocessing to avoid sine-wave patterns, and refer to it as Phase Alignment STS (PA-STS) clustering. PA-STS clustering is the first algorithm based on theoretical analysis to obtain meaningful clustering results. We present experimental results that show the reliability of the theoretical results and the effectiveness of PA-STS clustering when applied to UCR datasets.
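For readers unfamiliar with the procedure the abstract analyzes, the following is a minimal sketch of plain STS clustering: every length-w sliding window of a series is treated as a point, and k-means is run on those points. It is not the authors' code and does not implement PA-STS. The function names, the random-walk toy input, the window length w=16, and k=3 are all assumptions chosen for illustration, and no per-window normalization is applied.

```python
import random


def sliding_windows(series, w):
    """Return every length-w subsequence of series (stride 1)."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = random.Random(seed)
    # Initialize centers from k distinct randomly chosen points.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        new_centers = [
            mean_vector(members) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sts_cluster_centers(series, w, k, seed=0):
    """STS clustering: k-means over all sliding-window subsequences."""
    return kmeans(sliding_windows(series, w), k, seed=seed)


# Toy input (an assumption for illustration): a Gaussian random walk.
rng = random.Random(42)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

centers = sts_cluster_centers(walk, w=16, k=3)
```

Each entry of `centers` is a length-16 vector that can be plotted to inspect the shape of the cluster centers for a given input series.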
Similar resources
Why Does Subsequence Time-Series Clustering Produce Sine Waves?
Data mining and machine learning communities were surprised when Keogh et al. (2003) pointed out that the k-means cluster centers in subsequence time-series clustering become sinusoidal pseudopatterns for almost all kinds of input time-series data. Understanding this mechanism is an important open problem in data mining. Our new theoretical approach (based on spectral clustering and translationa...
Useful Clustering Outcomes from Meaningful Time Series Clustering
Clustering time series data using the popular subsequence (STS) technique has been widely used in the data mining and wider communities. Recently it was concluded that this technique is meaningless, based on the findings that it produces (a) clustering outcomes for distinct time series that are not distinguishable from one another, and (b) cluster centroids that are smoothed. More recent work has si...
Selective Subsequence Time Series clustering
Elsevier B.V., 2012. http://dx.doi.org/10.1016/j.knosys.2012.04.022
Subsequence Time Series (STS) Clustering is a time series mining task used to discover clusters of interesting subsequences in time series data...
Combination of Transformed-means Clustering and Neural Networks for Short-Term Solar Radiation Forecasting
In order to provide an efficient conversion and utilization of solar power, solar radiation data should be measured continuously and accurately over the long-term period. However, the measurement of solar radiation is not available to all countries in the world due to some technical and fiscal limitations. Hence, several studies were proposed in the literature to find mathematical and physical mod...
Fuzzy clustering of time series data: A particle swarm optimization approach
With rapid development in information gathering technologies and access to large amounts of data, we always require methods for analyzing data and extracting useful information from large raw datasets, and data mining is an important method for solving this problem. Clustering analysis, as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...
Publication date: 2008